This commit implements the F-beta score metric #1543
Conversation
This commit implements the F-beta score metric for the AnswerCorrectness class.
Hey, this is useful. From ragas 0.2 onwards, we have the factual correctness score; can you also add this to it?
score = 2 * (precision * recall) / (precision + recall + 1e-8)
Oh, thanks sir. I'll do it now.
The beta parameter is added to _factual_correctness, which computes a weighted harmonic mean of precision and recall, with recall weighted by a factor of beta. The F-beta score is defined as:

F-beta = (1 + beta^2) * (precision * recall) / (beta^2 * precision + recall)

The F-beta score is a generalization of the F1 score, which is the special case beta = 1.0. The F1 score is the harmonic mean of precision and recall:

F1 = 2 * (precision * recall) / (precision + recall)
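The formula above can be sketched in a few lines of Python. Note that `fbeta_score` here is an illustrative helper written for this discussion, not the actual ragas implementation; the small epsilon guards against division by zero, mirroring the existing F1 code.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 1.0, eps: float = 1e-8) -> float:
    """F-beta score from raw counts; beta > 1 weights recall more, beta < 1 weights precision more."""
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    # (1 + beta^2) * P * R / (beta^2 * P + R), with eps to avoid a zero denominator
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall + eps)
```

With `beta=1.0` this reduces to the plain F1 score; raising beta shifts the score toward recall, so a model with perfect precision but imperfect recall scores lower under `beta=2.0` than under `beta=0.5`.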
I've added the F-beta calculation in factual correctness, keeping the F1 score as the F-beta score with beta = 1, as requested.
Hey @Yuri-Albuquerque, thanks for the change.
Absolutely, @shahules786, feel free to take over this merge! I completely agree that this function belongs in utils. I'm still learning how to contribute effectively to this project, so I appreciate your guidance. I work at one of the largest private banks in Brazil (Itaú SA), and we're using RAGAS to evaluate various features in our chatbot. The "F1-beta" metric is something we've been wanting for a long time.
Hey @Yuri-Albuquerque just made the changes from my end. Thanks a lot :)
The F-beta score is implemented for the AnswerCorrectness class. The beta parameter is introduced to control the relative importance of recall and precision when calculating the score. Specifically:
Key Changes:
The method _compute_statement_presence is updated to calculate the F-beta score based on true positives (TP), false positives (FP), and false negatives (FN).
This ensures that we can balance between recall and precision, depending on the task's requirements, by tuning the beta value.
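The key change above can be illustrated with a sketch of how the statement-presence step might compute the score from classified statements. The function name and list-based inputs are assumptions for illustration, not the actual signature of `_compute_statement_presence` in ragas.

```python
from typing import List

def compute_statement_presence(tp_statements: List[str],
                               fp_statements: List[str],
                               fn_statements: List[str],
                               beta: float = 1.0) -> float:
    """Score answer correctness from classified statements.

    tp_statements: answer statements supported by the ground truth
    fp_statements: answer statements not supported by the ground truth
    fn_statements: ground-truth statements missing from the answer
    """
    tp, fp, fn = len(tp_statements), len(fp_statements), len(fn_statements)
    precision = tp / (tp + fp + 1e-8)
    recall = tp / (tp + fn + 1e-8)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall + 1e-8)
```

For example, with 3 supported statements, 1 unsupported statement, and 2 missing statements, `beta=1.0` yields the ordinary F1 of 2/3, while `beta=2.0` penalizes the missing (recall) side more heavily and yields 0.625.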
source: https://scikit-learn.org/1.5/modules/generated/sklearn.metrics.fbeta_score.html
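For a sanity check against the scikit-learn definition linked above, the same quantity can be re-derived from label lists in pure Python (no sklearn import needed); the label vectors below are made-up examples:

```python
def fbeta_from_labels(y_true, y_pred, beta):
    """F-beta computed from binary label lists, following the sklearn definition."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

y_true = [1, 1, 1, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
f1 = fbeta_from_labels(y_true, y_pred, beta=1.0)
f2 = fbeta_from_labels(y_true, y_pred, beta=2.0)
```

Here precision is 0.75 and recall is 0.6, so F1 is 2/3 while F2 (recall-weighted) is 0.625, matching `sklearn.metrics.fbeta_score` on the same inputs.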